Abstract

From the spectral plot of the (normalized) graph Laplacian, the essential qual- 
itative properties of a network can be simultaneously deduced. Given a class of 
empirical networks, reconstruction schemes for elucidating the evolutionary dy- 
namics leading to those particular data can then be developed. This method is 
exemplified for protein-protein interaction networks. Traces of their evolutionary 
history of duplication and divergence processes are identified. In particular, we 
can identify typical specific features that robustly distinguish protein-protein in- 
teraction networks from other classes of networks, in spite of possible statistical 
fluctuations of the underlying data. 

1 Introduction 

In recent years, many studies have investigated important parameters of empirical
networks, such as degree distribution, average path length, diameter, betweenness
centrality, and transitivity or clustering coefficient. Such studies have identified
certain rather universal features that are valid for networks across a wide range of
disciplines, like scale-free degree distributions. Conversely, on this basis, algorithms
could often be developed that, perhaps after fitting certain free parameters, construct
networks with the same qualitative properties and values for such variables.
Here, we look at an essentially complete set of graph variables, given by the spectrum
of the normalized Laplacian of the graph. On this basis, we can then develop algorithms
that construct networks with all the essential qualitative properties of the ones in a
given data set. For biological networks, we can thereby retrace the regularities in their
evolutionary history. Here, we demonstrate this principle and apply this method to
protein-protein interaction networks (PPIN for short). We detect indications of an
evolutionary history of duplication and divergence, as argued in flTl lTl.

This approach then also sheds light on a somewhat different issue, namely which fea- 
tures and properties are distinctive for networks from particular empirical classes, as 
opposed to universal features shared across classes. 
2 The normalized Laplacian and its spectrum



We model a network as a graph $\Gamma$ with $N$ vertices or nodes. Two vertices
$i, j \in \Gamma$ are called neighbors, $i \sim j$, when they are connected by an edge
of $\Gamma$. For a vertex $i \in \Gamma$, let $n_i$ be its degree, that is, the number
of its neighbors. For functions $v$ from the vertices of $\Gamma$ to $\mathbb{R}$, we
define the (normalized) Laplacian as

$$\Delta v(i) := v(i) - \frac{1}{n_i} \sum_{j,\, j \sim i} v(j). \qquad (1)$$
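As a minimal sketch of definition (1), assuming the graph is given as a symmetric 0/1 adjacency matrix without isolated vertices, the operator and its spectrum can be computed with NumPy. The function names are illustrative and do not come from the original work. The eigenvalues are taken from the symmetric matrix $I - D^{-1/2} A D^{-1/2}$, which is similar to the normalized Laplacian and therefore has the same spectrum.

```python
import numpy as np


def normalized_laplacian(adj):
    """Matrix of the operator (Lap v)(i) = v(i) - (1/n_i) * sum_{j~i} v(j)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("every vertex needs at least one neighbor")
    # Divide row i of the adjacency matrix by the degree n_i.
    return np.eye(len(adj)) - adj / deg[:, None]


def laplacian_spectrum(adj):
    """Eigenvalues of the normalized Laplacian, in ascending order."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    if np.any(deg == 0):
        raise ValueError("every vertex needs at least one neighbor")
    inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric matrix similar to the normalized Laplacian.
    sym = np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.linalg.eigvalsh(sym)


# Example: the path graph 0 - 1 - 2.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
spectrum = laplacian_spectrum(path)  # approximately [0, 1, 2]
```

Constant functions are mapped to zero by this operator, which is why 0 is always an eigenvalue of a connected graph.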

This is different from the algebraic graph Laplacian usually studied in the graph
theoretical literature, see e.g. [3], but equivalent to the Laplacian investigated in [5]. This
normalized Laplacian is, for example, the operator underlying random walks on graphs,
and in contrast to the algebraic Laplacian, it naturally incorporates a conservation law.
The spectrum, that is, the collection of eigenvalues of $\Delta$, yields important invariants
of the underlying graph $\Gamma$ that incorporate its qualitative properties, for example, how
difficult it is to decompose the graph, or how different it is from a bipartite graph, that
is, one with two types of vertices where connections are only permitted between
vertices of different type (see [3]). Also, the spectrum controls the behavior of dynamical
processes supported by the network (see [12, 11]). One can essentially recover the
graph from its spectrum (for a heuristic algorithm, see lO), up to isospectral graphs. 
The latter are known to exist, but are relatively rare and qualitatively quite similar in 
most respects. 

The multiplicity $m_1$ of the eigenvalue 1 of $\Delta$ is particularly significant. $m_1$ is the
number of linearly independent solutions of $\Delta v(i) = v(i)$ for all $i$, that is, of

$$\frac{1}{n_i} \sum_{j,\, j \sim i} v(j) = 0 \quad \text{for all } i. \qquad (2)$$

(Equivalently, $m_1$ is the dimension of the kernel of the adjacency matrix of $\Gamma$.) Such
functions can be created by node duplication: Take any node $i_0 \in \Gamma$ and form a new
graph $\Gamma_0$ by adding a new node $j_0$ to $\Gamma$ and connecting it to all neighbors of $i_0$. Thus,
in $\Gamma_0$, $i_0$ and $j_0$ have the same neighbors. A solution $v$ of (2) on $\Gamma_0$ then is obtained by
putting $v(i_0) = 1$, $v(j_0) = -1$ and $v(i) = 0$ for all other nodes $i$. In other words, node
duplication increases $m_1$ by 1. For this reason, it constitutes an important invariant for
our investigation of protein-protein interaction networks. In a similar vein, doubling
an edge that connects vertices $p_1, p_2$ produces the eigenvalues $\lambda = 1 \pm \frac{1}{\sqrt{n_{p_1} n_{p_2}}}$, which
are symmetric about 1, and close to 1 when the degrees are sufficiently large. Also, if
we duplicate a particular node m times, then the number of specific motifs containing 
that node will grow like (™); again that then is something that can easily be detected 
in given network data. 
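The effect of node duplication on the multiplicity of the eigenvalue 1 can be checked numerically. The following is a minimal sketch, assuming a simple graph (no self-loops, no isolated vertices) stored as a symmetric 0/1 adjacency matrix; it uses the equivalence stated above, namely that the multiplicity equals the dimension of the kernel of the adjacency matrix. The helper names are illustrative only.

```python
import numpy as np


def duplicate_node(adj, i0):
    """Add a new node that is connected to exactly the neighbors of node i0."""
    adj = np.asarray(adj)
    n = len(adj)
    new = np.zeros((n + 1, n + 1), dtype=adj.dtype)
    new[:n, :n] = adj
    new[n, :n] = adj[i0]  # the copy inherits the neighbors of i0
    new[:n, n] = adj[i0]  # keep the matrix symmetric
    return new


def multiplicity_of_one(adj):
    """Dimension of the kernel of the adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    return len(adj) - np.linalg.matrix_rank(adj)


# Path graph 0 - 1 - 2; duplicating the end node 0 gives a star with center 1.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
star = duplicate_node(path, 0)

before = multiplicity_of_one(path)  # 1
after = multiplicity_of_one(star)   # 2

# The function with value 1 on the old node, -1 on its copy and 0 elsewhere
# sums to zero over the neighbors of every vertex.
v = np.array([1, 0, 0, -1])
```

In this small example the multiplicity grows from 1 to 2, and the vector `v` lies in the kernel of the new adjacency matrix, as the construction in the text predicts.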
Figure 1:

3 Spectral plot and structural analysis of protein-protein 
interaction networks 

In spite of their rather wide range of sizes and in spite of possible statistical fluctuations
affecting the accuracy of the underlying data, the spectral plots of the different PPINs¹
share a particular pattern (Fig. 1; the spectral density is given as a sum of Lorentz
distributions, $\rho(\lambda) = \sum_{k=1}^{N-1} \frac{\gamma}{(\lambda_k - \lambda)^2 + \gamma^2}$ with width $\gamma = 0.08$, where
$\lambda_1, \dots, \lambda_{N-1}$ are the nonzero eigenvalues). The most prominent feature is the sharp
peak around the eigenvalue 1.² Also, the large degree of symmetry around 1 is noteworthy.
As a control, the various important structural parameters also have typical ranges; examples
are, $N$ being the size of the network: Maximum degree < j^, $1.56N$ < Number of
edges < $1.97N$, $0.307N < m_1 < 0.445N$, 0.015 < Transitivity (relative frequency
of vertex triangles) < 0.028.
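A smoothed spectral plot of this kind can be sketched as follows. This is an illustration under stated assumptions: each eigenvalue contributes a Lorentz kernel of the form $\gamma / ((\lambda_k - \lambda)^2 + \gamma^2)$ with $\gamma = 0.08$, the overall normalization is an assumption, and the toy eigenvalue list is invented for the example rather than taken from any PPIN data.

```python
import numpy as np


def lorentz_density(eigenvalues, grid, gamma=0.08):
    """Sum of Lorentz kernels of width gamma, evaluated on the given grid."""
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # diff[a, k] = lambda_k - grid[a]
    diff = eigenvalues[None, :] - grid[:, None]
    return (gamma / (diff ** 2 + gamma ** 2)).sum(axis=1)


# Toy spectrum (hypothetical): eigenvalue 1 with multiplicity 3,
# plus one pair placed symmetrically about 1.
toy_eigenvalues = [0.5, 1.0, 1.0, 1.0, 1.5]
grid = np.linspace(0.0, 2.0, 201)
density = lorentz_density(toy_eigenvalues, grid)
peak_position = grid[np.argmax(density)]
```

Because the kernel has positive width, the smoothed density is nonzero slightly outside the interval containing the eigenvalues; a high multiplicity of the eigenvalue 1 shows up as a sharp peak at 1.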

In particular, the multiplicity $m_1$ of the eigenvalue 1 and the transitivity are much
larger than in random graphs of Erdős-Rényi type with a similar number of vertices
and edges. Similar observations hold for small motifs, that is, subgraphs of a particular
type, like cyclic chains of 4 vertices or structures where 3 vertices do not have direct
connections, but are each connected to a central 4th vertex (data not shown).
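For reference, the transitivity of a network can be computed from its adjacency matrix. The sketch below assumes the standard global definition (three times the number of triangles divided by the number of connected vertex triples), which may differ in detail from the convention used for the values quoted above; the function name is illustrative.

```python
import numpy as np


def transitivity(adj):
    """Global transitivity: 3 * (number of triangles) / (number of connected triples)."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # trace(A^3) counts every triangle 6 times; sum of n_i * (n_i - 1)
    # counts every connected triple twice, so the ratio is 3T / triples.
    closed = np.trace(np.linalg.matrix_power(adj, 3))
    triples = (deg * (deg - 1)).sum()
    return float(closed / triples) if triples > 0 else 0.0


triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])

t_triangle = transitivity(triangle)  # 1.0
t_path = transitivity(path)          # 0.0
```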



4 Model and network reconstruction 

On the basis of the spectral analysis, a constructive model for the evolution of a PPIN
can be proposed. The criterion is that the model reproduce all the essential spectral
features of the data class. Our constructive model for PPINs is inspired by general
evolutionary considerations. The basic evolutionary processes for growth and evolution
of PPINs are duplication of proteins (nodes) and mutation of connections (edges).

¹See data source for details.

²A high multiplicity of eigenvalue 1 has also been observed in other networks, like the Internet [16].

Instead of cross links between the old protein and its duplicated copy, which would
produce too small values for the transitivity, a low-probability preference for
second-order neighbors as recipients of new connections is assumed. New connections with
other proteins then occur with a different probability. Since in link dynamics,
attachment occurs preferentially towards partners of high connectivity [2], some preferential
attachment to proteins of higher degree is included. In contrast, deletion is
random with a uniform probability.

Since genome evolution analysis ifTTl |3 on one hand supports the idea that the 
divergence of duplicated genes takes place shortly after the duplication, but on the 
other hand only indirect evidence is available for rapid functional divergence after gene 
duplication 111 71, we have considered two different mutation processes:

1. A random deletion process that is independent of the duplication process occurs
uniformly with probability $\delta$, and two different kinds of addition processes with
preference towards a partner with high degree.

(a) Connection with protein $i$ at distance 2 with probability j^^ $a_1$, where
$d_i$ is the degree of protein $i$ and $a_1$ is a parameter.

(b) Connection with another protein $i$ (that could even be in another component)
with probability ■^^ $a_2$, with a parameter $a_2$.

2. A deletion with probability $\delta'$ that occurs for ^ of the duplications and shortly
after such a duplication. This process operates by elimination of one of the two
interactions in each redundant interaction pair of two duplicate proteins with
equal probability. For simplicity, there is no addition for this mutation process.

To make the duplication process independent of the first mutation process and to
make the duplication rate lower than the mutation rate, duplication occurs with
probability $P_{dup}$ and with a preference that is the inverse of the square root of
the degree of the protein.

A component of the network can grow by duplication of proteins within that com- 
ponent or attachment of other components or isolated proteins. 

Here, we have neglected isolated proteins, but the model can be readily extended by
attachment of isolated proteins with some probability P^^^. One might also include
a mechanism for cross link connections between duplicate protein pairs with some
probability $P_{CLink}$, but the same effect can be achieved by tuning the other parameters.

The algorithm starts with a small seed network of two linked proteins. The growth
procedure is run until the giant component reaches our desired network size. 100
repetitions are performed with parameter values $P_{dup} = 0.15$, $\delta' = 0.7$, $\delta = 0.00025$,
$a_1 = 0.00008$, $a_2 = 0.0002$, P^^^ = 0.025, $P_{CLink}$ = O-^O^.
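The core duplication and divergence step can be illustrated with a deliberately simplified sketch. It is not the full model described above. It assumes that every step is a duplication, that the node to be copied is chosen with weight equal to the inverse square root of its degree, and that after every duplication each redundant interaction pair loses one of its two edges with probability `delta_prime`, the removed edge being chosen with equal probability. The random deletion process, both addition processes, the duplication probability and the restriction to the giant component are all omitted, and the function name is hypothetical.

```python
import random


def grow_network(target_size, delta_prime=0.7, seed=0):
    """Grow a network by node duplication followed by divergence (simplified)."""
    rng = random.Random(seed)
    neighbors = {0: {1}, 1: {0}}  # seed network of two linked proteins
    while len(neighbors) < target_size:
        # Duplication: prefer low-degree proteins, weight = 1 / sqrt(degree).
        candidates = [v for v in neighbors if neighbors[v]]
        weights = [len(neighbors[v]) ** -0.5 for v in candidates]
        old = rng.choices(candidates, weights=weights, k=1)[0]
        new = len(neighbors)
        neighbors[new] = set(neighbors[old])
        for u in neighbors[new]:
            neighbors[u].add(new)
        # Divergence: for each shared partner u, remove one of the two
        # redundant edges (old, u) or (new, u) with probability delta_prime.
        for u in sorted(neighbors[old]):
            if rng.random() < delta_prime:
                loser = old if rng.random() < 0.5 else new
                neighbors[loser].discard(u)
                neighbors[u].discard(loser)
    return neighbors


network = grow_network(50)
edge_count = sum(len(nbrs) for nbrs in network.values()) // 2
```

Since at most one edge of each redundant pair is removed, the number of edges never decreases in this sketch, so a protein with at least one interaction is always available for the next duplication. Individual proteins can still become isolated, which is why the full procedure tracks the giant component.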

The structural properties of the resulting giant component (size ≈ 500) are: Maximum
degree ≈ 43.69, Number of edges ≈ 712.97, $m_1$ ≈ 161.07, Transitivity ≈ 0.02793.

Thus, the spectral plot (Fig. 2) and the structural properties of the giant component
of the simulated network match the real PPIN data closely.³

Figure 2:



A comparison with generic network construction algorithms shows that they necessarily
miss important structural properties that are characteristic for PPINs and
distinguish them from networks from other biological or nonbiological realms. Prominent
examples of such generic schemes are a regular network, the random network
of Erdős-Rényi [6], the scale-free network construction by preferential attachment of
Barabási-Albert [1], and the small-world network by random rewiring of a regular
network of Watts-Strogatz [19]. Spectral plots of such networks, with the corresponding
parameters adjusted to match the ones found for PPINs and constructed by the
same scheme as in our algorithm, are clearly qualitatively different from the ones for
the real data and our reconstructed network (see Fig. 3). This indicates that our spectral
analysis uncovers features that are specific for PPINs.

Other previous reconstruction schemes (lOl fT^lfTsjl ) typically focus on certain
individual parameters, in distinction to our emphasis on the entire spectrum. Consequently,
the spectral plots are also different (details not shown). The model of [15] includes a
parameter $p$ that incorporates the probability of cross interactions between the old
protein and its duplicated copy, for example resulting from self-interactions of the old one.
A realistic value of $p$ can then be determined from the data in ifTTl [Tsll and is smaller
than 0.018. That upper bound is the value employed in [15], but this scheme, for
example, leads to too small a value for the transitivity of the giant cluster. Therefore, in our
model we assumed that, with some low probability, there is a preference for a protein
to make new connections with its 2nd neighbors.

³The spectrum of the Laplacian is always confined between 0 and 2. This is not quite exhibited by our
spectral plots, due to the positive width of the kernel employed in our visualization.

Figure 3: Spectral plots of (a) a random network by the Erdős-Rényi model [6] with
$p = 0.05$, (b) a small-world network by the Watts-Strogatz model [19] (rewiring a
regular ring lattice of average degree 4 with rewiring probability 0.3), (c) a scale-free
network by the Barabási-Albert model [1] ($m_0 = 5$ and $m = 3$). Size of all networks
is 500. All figures are plotted with 100 realizations.

Data Sources 

The protein-protein interaction data sets for Saccharomyces cerevisiae (yeast) are from
http://www.nd.edu/~networks/, used in [10] [download date: 17th September, 2004].
The ones for Escherichia coli as used in [4], Caenorhabditis elegans, Helicobacter
pylori and, as a check, a second data set for Saccharomyces cerevisiae are taken from
http://www.cosin.org/ [download date: 25th September, 2005]. Note that these two
data sets on the same cell are quite different. This indicates the robustness of our
method in view of possibly significant statistical fluctuations of the data employed.
Our analysis has always been performed on the giant components of these networks so
as to work with connected graphs, and we have neglected the many small components
and isolated proteins.